 audio recording


Uber passengers can now make audio recordings of their journey if they feel unsafe

Daily Mail - Science & tech

Uber is making a major update to improve safety for millions of passengers in the UK. Riders will now be able to make audio recordings of their journey through the Uber app if they feel unsafe. Users can activate the feature either before or during the trip and start recording at any point with the press of a button.


'Creepy' Listening Tool for Targeted Ads Didn't Actually Work, FTC Says

WIRED

Three firms will pay nearly $1 million for selling "Active Listening" technology that they claimed tapped people's phones for advertising. The FTC alleges the "tech" was just pricey email lists. The Federal Trade Commission announced on Thursday that Cox Media Group and two other marketing companies, MindSift LLC and 1010 Digital Works, have agreed to collectively pay nearly $1 million to settle allegations that they deceived their customers--other businesses--by claiming that they could help target ads based on audio recordings collected from consumers' smart devices via a marketing service called Active Listening. In a statement to WIRED, a spokesperson for CMG says, "We are pleased to have this matter resolved. Our local marketing team relied on marketing materials provided to us by a third-party vendor about their product. We withdrew the materials expeditiously and stopped further use of the product."




MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification

arXiv.org Artificial Intelligence

Audio classification plays an essential role in sentiment analysis and emotion recognition, especially for analyzing customer attitudes in marketing phone calls. Efficiently categorizing customer purchasing propensity from large volumes of audio data remains challenging. In this work, we propose a novel Multi-Segment Multi-Task Fusion Network (MSMT-FN) designed specifically to address this business demand. Evaluations conducted on our proprietary MarketCalls dataset, as well as on established benchmarks (CMU-MOSI, CMU-MOSEI, and MELD), show that MSMT-FN consistently outperforms or matches state-of-the-art methods. Additionally, our newly curated MarketCalls dataset will be available upon request, and the code base is accessible in the MSMT-FN GitHub repository, to facilitate further research and advancements in the audio classification domain.


H-Infinity Filter Enhanced CNN-LSTM for Arrhythmia Detection from Heart Sound Recordings

arXiv.org Artificial Intelligence

Early detection of heart arrhythmia can prevent severe future complications in cardiac patients. While manual diagnosis remains the clinical standard, it relies heavily on visual interpretation and is inherently subjective. In recent years, deep learning has emerged as a powerful tool to automate arrhythmia detection, offering improved accuracy, consistency, and efficiency. Several variants of convolutional and recurrent neural network architectures have been widely explored to capture spatial and temporal patterns in physiological signals. Despite these advances, current models often struggle to generalize in real-world scenarios, especially when dealing with small or noisy datasets, which are common in biomedical applications. In this paper, a novel CNN-H-Infinity-LSTM architecture is proposed to identify arrhythmic heart signals from heart sound recordings. This architecture introduces trainable parameters inspired by the H-Infinity filter from control theory, enhancing robustness and generalization. Extensive experimentation on the PhysioNet CinC Challenge 2016 dataset, a public benchmark of heart audio recordings, demonstrates that the proposed model achieves stable convergence and outperforms existing benchmarks, with a test accuracy of 99.42% and an F1 score of 98.85%.
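The abstract reports accuracy and F1 score side by side. As a reminder of how the two metrics relate, here is a minimal sketch that computes both from binary confusion counts. The function name and the counts are hypothetical illustrations and are not taken from the paper or from the PhysioNet dataset.

```python
from typing import Tuple


def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (accuracy, F1) for a binary classifier from its confusion counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    # Accuracy counts every correct decision, including true negatives.
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall; it ignores true negatives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical, imbalanced example: few positives, many negatives.
acc, f1 = accuracy_and_f1(tp=40, fp=5, fn=5, tn=950)
print(f"accuracy={acc:.4f}  f1={f1:.4f}")
```

Because F1 disregards true negatives, it can sit well below accuracy when the positive class is rare, which is why reporting both is informative.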


Estimating Respiratory Effort from Nocturnal Breathing Sounds for Obstructive Sleep Apnoea Screening

arXiv.org Artificial Intelligence

Obstructive sleep apnoea (OSA) is a prevalent condition with significant health consequences, yet many patients remain undiagnosed due to the complexity and cost of overnight polysomnography. Acoustic-based screening provides a scalable alternative, but its performance is limited by environmental noise and the lack of physiological context. Respiratory effort is a key signal used in clinical scoring of OSA events, but current approaches require additional contact sensors that reduce scalability and patient comfort. This paper presents the first study to estimate respiratory effort directly from nocturnal audio, enabling physiological context to be recovered from sound alone. We propose a latent-space fusion framework that integrates the estimated effort embeddings with acoustic features for OSA detection. Using a dataset of 157 nights from 103 participants recorded in home environments, our respiratory effort estimator achieves a concordance correlation coefficient of 0.48, capturing meaningful respiratory dynamics. Fusing effort and audio improves sensitivity and AUC over audio-only baselines, especially at low apnoea-hypopnoea index thresholds. The proposed approach requires only smartphone audio at test time, which enables sensor-free, scalable, and longitudinal OSA monitoring.
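The estimator above is scored with a concordance correlation coefficient (CCC). For readers unfamiliar with the metric, the following is a minimal sketch of Lin's CCC using population (1/N) moments. The function name and the toy signals are illustrative assumptions; the abstract does not specify how the authors compute the metric.

```python
from typing import Sequence


def concordance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population variances and covariance (divide by n, not n - 1).
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes a constant offset between the series.
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov_xy / denom


# Toy signals: the second is the first shifted by a constant offset of 1.
reference = [1.0, 2.0, 3.0]
estimate = [2.0, 3.0, 4.0]
print(concordance_correlation(reference, estimate))
```

Unlike Pearson correlation, which would be 1 for the shifted toy signals, CCC also penalizes differences in offset and scale, so it measures agreement rather than only co-variation.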


A Appendix

Neural Information Processing Systems

A.1 Self-supervised loss formula

Wav2vec 2.0, when trained in a self-supervised way, uses a loss (L) which is the weighted combination of two losses: one diversity loss (L [...]

Then, we use the nistats [Abraham et al., 2014] compute_regressor function with the 'glover' model to temporally convolve (h [...]

To address this issue, [Pasad et al., 2021] explored the encoding of local acoustic features, phone identity, word identity and word meaning across layers. Similarly, [Millet et al., 2021] compared representations to human behavioural data to assess whether they better captured listeners' perception of higher-level phonemic properties or of lower-level subphonemic properties of speech stimuli. Finally, a recent study [Vaidya et al., 2022] explores filter banks, spectrograms, phonemes and words across layers. Here, we complement these analyses by showing that self-supervised learning allows wav2vec 2.0 to learn, along its hierarchy, representations of MEL spectrograms, phonetic categories and word embeddings (Figure S1). We study the following features: the MEL spectrogram of the audio, computed using librosa (d=128); the phonemes (categorical features).


SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection

arXiv.org Artificial Intelligence

Respiratory insufficiency is a medical symptom in which a person has a reduced amount of oxygen in the blood. This paper reports the experience of building SPIRA: an intelligent system for detecting respiratory insufficiency from voice. It compiles challenges faced in two successive implementations of the same architecture, summarizing lessons learned on data collection, training, and inference for future projects on similar systems.


Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling

arXiv.org Artificial Intelligence

We introduce an extensive new dataset of MIDI files, created by transcribing audio recordings of piano performances into their constituent notes. The data pipeline we use is multi-stage, employing a language model to autonomously crawl and score audio recordings from the internet based on their metadata, followed by a stage of pruning and segmentation using an audio classifier. The resulting dataset contains over one million distinct MIDI files, comprising roughly 100,000 hours of transcribed audio. We provide an in-depth analysis of our techniques, offering statistical insights, and investigate the content by extracting metadata tags, which we also provide. Central to the success of deep learning as a paradigm has been the datasets used to train neural networks. With the rapid technical advancements and ever-increasing availability of computational power, music has become a popular target for deep learning research, and deep learning in turn has had a notable impact on the study and creation of musical works (Briot et al., 2019). The progress of music-oriented deep learning depends heavily on access to diverse, well-structured datasets. Music is inherently structured and can be represented computationally in a variety of forms (Wiggins, 2016). In this work, we focus on symbolic representations of music, such as MIDI (Musical Instrument Digital Interface), which are widely used for encoding, analyzing, and facilitating the generation of musical compositions by both humans and machines (Ji et al., 2023).